Exploration server for computer science research in Lorraine (Serveur d'exploration sur la recherche en informatique en Lorraine)

A Neural-Linguistic Approach for the Recognition of a Wide Arabic Word Lexicon

Internal identifier: 003186 (Main/Exploration); previous: 003185; next: 003187

Authors: I. Ben Cheikh [Tunisia]; A. Kacem [Tunisia]; Abdel Belaïd [France]

RBID : Pascal:10-0429694

Abstract

Recently, we investigated the use of Arabic linguistic knowledge to improve the recognition of a wide Arabic word lexicon. A neural-linguistic approach was proposed, dealing mainly with the canonical vocabulary of decomposable words derived from tri-consonant healthy roots. The basic idea is to factorize words by their roots and schemes. To this end, we designed two neural networks, TNN_R and TNN_S, to recognize roots and schemes, respectively, from the structural primitives of words. The proposed approach achieved promising results. In this paper, we focus on how to reach better results in terms of accuracy and recognition rate. The current improvements mainly concern the training stage. They consist of 1) benefiting from the order of word letters, 2) considering "sister letters" (letters having the same features), 3) supervising the networks' behavior, 4) splitting up neurons to save letter occurrences, and 5) resolving observed ambiguities. With these improvements, experiments carried out on a 1500-word vocabulary show a significant enhancement: the TNN_R (resp. TNN_S) top-4 rate rose from 77% to 85.8% (resp. from 65% to 97.9%). Enlarging the vocabulary from 1000 to 1700 words, adding 100 words each time, again confirmed the results without altering the networks' stability.
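
The two ideas the abstract relies on, factorizing a word into a root and a scheme and scoring a recognizer by its top-4 rate, can be illustrated with a minimal sketch. It is not the authors' implementation: the Latin transliteration, the digit placeholders in the scheme strings, and the helper names `apply_scheme` and `top_k_rate` are assumptions made here for illustration only.

```python
from typing import Sequence


def apply_scheme(root: str, scheme: str) -> str:
    """Build a word from a tri-consonant root and a scheme.

    Assumption: the scheme is a template in which the digits 1, 2 and 3
    stand for the first, second and third root consonant.
    """
    if len(root) != 3:
        raise ValueError("expected a tri-consonant root")
    return "".join(root[int(ch) - 1] if ch in "123" else ch for ch in scheme)


def top_k_rate(ranked: Sequence[Sequence[str]], truth: Sequence[str], k: int = 4) -> float:
    """Fraction of samples whose true label is among the first k candidates."""
    if len(ranked) != len(truth):
        raise ValueError("ranked and truth must have the same length")
    if not truth:
        return 0.0
    hits = sum(1 for cands, label in zip(ranked, truth) if label in cands[:k])
    return hits / len(truth)


if __name__ == "__main__":
    # One root combined with two schemes yields two words of the lexicon.
    print(apply_scheme("ktb", "1a2a3a"))
    print(apply_scheme("ktb", "ma12uu3"))
    # Toy ranked outputs of a hypothetical root recognizer.
    ranked = [["ktb", "ktm", "kbt"], ["drs", "dzs", "dsr"]]
    print(top_k_rate(ranked, ["ktb", "dsr"], k=2))
```

Under this view, a lexicon of R roots and S schemes covers up to R × S words while each network only has to separate R or S classes, which is one way to read the motivation for recognizing roots and schemes separately.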


The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en" level="a">A Neural-Linguistic Approach for the Recognition of a Wide Arabic Word Lexicon</title>
<author>
<name sortKey="Ben Cheikh, I" sort="Ben Cheikh, I" uniqKey="Ben Cheikh I" first="I." last="Ben Cheikh">I. Ben Cheikh</name>
<affiliation wicri:level="1">
<inist:fA14 i1="01">
<s1>UTIC-ESSTT, 5 Avenue Taha Hussein, BP56 Bab Menara</s1>
<s2>1008 Tunis</s2>
<s3>TUN</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
</inist:fA14>
<country>Tunisie</country>
<placeName>
<settlement type="city">Tunis</settlement>
<region nuts="2">Gouvernorat de Tunis</region>
</placeName>
</affiliation>
</author>
<author>
<name sortKey="Kacem, A" sort="Kacem, A" uniqKey="Kacem A" first="A." last="Kacem">A. Kacem</name>
<affiliation wicri:level="1">
<inist:fA14 i1="01">
<s1>UTIC-ESSTT, 5 Avenue Taha Hussein, BP56 Bab Menara</s1>
<s2>1008 Tunis</s2>
<s3>TUN</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
</inist:fA14>
<country>Tunisie</country>
<placeName>
<settlement type="city">Tunis</settlement>
<region nuts="2">Gouvernorat de Tunis</region>
</placeName>
</affiliation>
</author>
<author>
<name sortKey="Belaid, A" sort="Belaid, A" uniqKey="Belaid A" first="A." last="Belaïd">Abdel Belaïd</name>
<affiliation wicri:level="3">
<inist:fA14 i1="02">
<s1>LORIA, Campus scientifique, B.P. 239</s1>
<s2>54606 Vandœuvre-Lès-Nancy</s2>
<s3>FRA</s3>
<sZ>3 aut.</sZ>
</inist:fA14>
<country>France</country>
<placeName>
<region type="region" nuts="2">Grand Est</region>
<region type="old region" nuts="2">Lorraine (région)</region>
<settlement type="city">Vandœuvre-lès-Nancy</settlement>
</placeName>
<placeName>
<settlement type="city">Nancy</settlement>
<region type="region" nuts="2">Grand Est</region>
<region type="region" nuts="2">Lorraine (région)</region>
</placeName>
<orgName type="laboratoire" n="5">Laboratoire lorrain de recherche en informatique et ses applications</orgName>
<orgName type="university">Université de Lorraine</orgName>
<orgName type="institution">Centre national de la recherche scientifique</orgName>
<orgName type="institution">Institut national de recherche en informatique et en automatique</orgName>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">INIST</idno>
<idno type="inist">10-0429694</idno>
<date when="2010">2010</date>
<idno type="stanalyst">PASCAL 10-0429694 INIST</idno>
<idno type="RBID">Pascal:10-0429694</idno>
<idno type="wicri:Area/PascalFrancis/Corpus">000204</idno>
<idno type="wicri:Area/PascalFrancis/Curation">000815</idno>
<idno type="wicri:Area/PascalFrancis/Checkpoint">000185</idno>
<idno type="wicri:explorRef" wicri:stream="PascalFrancis" wicri:step="Checkpoint">000185</idno>
<idno type="wicri:doubleKey">0277-786X:2010:Ben Cheikh I:a:neural:linguistic</idno>
<idno type="wicri:Area/Main/Merge">003248</idno>
<idno type="wicri:Area/Main/Curation">003186</idno>
<idno type="wicri:Area/Main/Exploration">003186</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a">A Neural-Linguistic Approach for the Recognition of a Wide Arabic Word Lexicon</title>
<author>
<name sortKey="Ben Cheikh, I" sort="Ben Cheikh, I" uniqKey="Ben Cheikh I" first="I." last="Ben Cheikh">I. Ben Cheikh</name>
<affiliation wicri:level="1">
<inist:fA14 i1="01">
<s1>UTIC-ESSTT, 5 Avenue Taha Hussein, BP56 Bab Menara</s1>
<s2>1008 Tunis</s2>
<s3>TUN</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
</inist:fA14>
<country>Tunisie</country>
<placeName>
<settlement type="city">Tunis</settlement>
<region nuts="2">Gouvernorat de Tunis</region>
</placeName>
</affiliation>
</author>
<author>
<name sortKey="Kacem, A" sort="Kacem, A" uniqKey="Kacem A" first="A." last="Kacem">A. Kacem</name>
<affiliation wicri:level="1">
<inist:fA14 i1="01">
<s1>UTIC-ESSTT, 5 Avenue Taha Hussein, BP56 Bab Menara</s1>
<s2>1008 Tunis</s2>
<s3>TUN</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
</inist:fA14>
<country>Tunisie</country>
<placeName>
<settlement type="city">Tunis</settlement>
<region nuts="2">Gouvernorat de Tunis</region>
</placeName>
</affiliation>
</author>
<author>
<name sortKey="Belaid, A" sort="Belaid, A" uniqKey="Belaid A" first="A." last="Belaïd">Abdel Belaïd</name>
<affiliation wicri:level="3">
<inist:fA14 i1="02">
<s1>LORIA, Campus scientifique, B.P. 239</s1>
<s2>54606 Vandœuvre-Lès-Nancy</s2>
<s3>FRA</s3>
<sZ>3 aut.</sZ>
</inist:fA14>
<country>France</country>
<placeName>
<region type="region" nuts="2">Grand Est</region>
<region type="old region" nuts="2">Lorraine (région)</region>
<settlement type="city">Vandœuvre-lès-Nancy</settlement>
</placeName>
<placeName>
<settlement type="city">Nancy</settlement>
<region type="region" nuts="2">Grand Est</region>
<region type="region" nuts="2">Lorraine (région)</region>
</placeName>
<orgName type="laboratoire" n="5">Laboratoire lorrain de recherche en informatique et ses applications</orgName>
<orgName type="university">Université de Lorraine</orgName>
<orgName type="institution">Centre national de la recherche scientifique</orgName>
<orgName type="institution">Institut national de recherche en informatique et en automatique</orgName>
</affiliation>
</author>
</analytic>
<series>
<title level="j" type="main">Proceedings of SPIE, the International Society for Optical Engineering</title>
<title level="j" type="abbreviated">Proc. SPIE Int. Soc. Opt. Eng.</title>
<idno type="ISSN">0277-786X</idno>
<imprint>
<date when="2010">2010</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
<seriesStmt>
<title level="j" type="main">Proceedings of SPIE, the International Society for Optical Engineering</title>
<title level="j" type="abbreviated">Proc. SPIE Int. Soc. Opt. Eng.</title>
<idno type="ISSN">0277-786X</idno>
</seriesStmt>
</fileDesc>
<profileDesc>
<textClass>
<keywords scheme="KwdEn" xml:lang="en">
<term>Accuracy</term>
<term>Arabic</term>
<term>Consonants</term>
<term>Document retrieval</term>
<term>Learning</term>
<term>Lexicon</term>
<term>Neural networks</term>
<term>Pattern recognition</term>
<term>Vocabulary</term>
</keywords>
<keywords scheme="Pascal" xml:lang="fr">
<term>Réseau neuronal</term>
<term>Reconnaissance forme</term>
<term>Recherche documentaire</term>
<term>Arabe</term>
<term>Lexique</term>
<term>Vocabulaire</term>
<term>Consonne</term>
<term>Précision</term>
<term>Apprentissage</term>
<term>0130C</term>
<term>0705M</term>
<term>4230S</term>
</keywords>
<keywords scheme="Wicri" type="topic" xml:lang="fr">
<term>Recherche documentaire</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">Recently, we investigated the use of Arabic linguistic knowledge to improve the recognition of a wide Arabic word lexicon. A neural-linguistic approach was proposed, dealing mainly with the canonical vocabulary of decomposable words derived from tri-consonant healthy roots. The basic idea is to factorize words by their roots and schemes. To this end, we designed two neural networks, TNN_R and TNN_S, to recognize roots and schemes, respectively, from the structural primitives of words. The proposed approach achieved promising results. In this paper, we focus on how to reach better results in terms of accuracy and recognition rate. The current improvements mainly concern the training stage. They consist of 1) benefiting from the order of word letters, 2) considering "sister letters" (letters having the same features), 3) supervising the networks' behavior, 4) splitting up neurons to save letter occurrences, and 5) resolving observed ambiguities. With these improvements, experiments carried out on a 1500-word vocabulary show a significant enhancement: the TNN_R (resp. TNN_S) top-4 rate rose from 77% to 85.8% (resp. from 65% to 97.9%). Enlarging the vocabulary from 1000 to 1700 words, adding 100 words each time, again confirmed the results without altering the networks' stability.</div>
</front>
</TEI>
<affiliations>
<list>
<country>
<li>France</li>
<li>Tunisie</li>
</country>
<region>
<li>Gouvernorat de Tunis</li>
<li>Grand Est</li>
<li>Lorraine (région)</li>
</region>
<settlement>
<li>Nancy</li>
<li>Tunis</li>
<li>Vandœuvre-lès-Nancy</li>
</settlement>
<orgName>
<li>Centre national de la recherche scientifique</li>
<li>Institut national de recherche en informatique et en automatique</li>
<li>Laboratoire lorrain de recherche en informatique et ses applications</li>
<li>Université de Lorraine</li>
</orgName>
</list>
<tree>
<country name="Tunisie">
<region name="Gouvernorat de Tunis">
<name sortKey="Ben Cheikh, I" sort="Ben Cheikh, I" uniqKey="Ben Cheikh I" first="I." last="Ben Cheikh">I. Ben Cheikh</name>
</region>
<name sortKey="Kacem, A" sort="Kacem, A" uniqKey="Kacem A" first="A." last="Kacem">A. Kacem</name>
</country>
<country name="France">
<region name="Grand Est">
<name sortKey="Belaid, A" sort="Belaid, A" uniqKey="Belaid A" first="A." last="Belaïd">Abdel Belaïd</name>
</region>
</country>
</tree>
</affiliations>
</record>

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Wicri/Lorraine/explor/InforLorV4/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 003186 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 003186 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Wicri/Lorraine
   |area=    InforLorV4
   |flux=    Main
   |étape=   Exploration
   |type=    RBID
   |clé=     Pascal:10-0429694
   |texte=   A Neural-Linguistic Approach for the Recognition of a Wide Arabic Word Lexicon
}}

This area was generated with Dilib version V0.6.33.
Data generation: Mon Jun 10 21:56:28 2019. Site generation: Fri Feb 25 15:29:27 2022